Overview

Dataset statistics

Number of variables: 17
Number of observations: 2032002
Missing cells: 2712023
Missing cells (%): 7.9%
Duplicate rows: 6
Duplicate rows (%): < 0.1%
Total size in memory: 263.6 MiB
Average record size in memory: 136.0 B
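
The percentage and the per-record size in this table follow from the raw counts. A minimal sketch of that arithmetic, assuming the missing-cell share is taken over rows times variables and that 1 MiB is 1024 * 1024 bytes (the helper names are illustrative and not part of pandas-profiling):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 17
n_observations = 2_032_002
missing_cells = 2_712_023
total_size_mib = 263.6


def missing_cell_percentage(missing, rows, cols):
    """Share of empty cells among all rows * cols cells, in percent."""
    return 100.0 * missing / (rows * cols)


def average_record_size_bytes(total_mib, rows):
    """Average in-memory bytes per row, taking 1 MiB = 1024 * 1024 bytes."""
    return total_mib * 1024 * 1024 / rows


missing_pct = missing_cell_percentage(missing_cells, n_observations, n_variables)
avg_record = average_record_size_bytes(total_size_mib, n_observations)
print(f"{missing_pct:.1f}%  {avg_record:.1f} B")
```

Rounded to one decimal, both results agree with the reported 7.9% and 136.0 B.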

Variable types

Categorical: 15
Numeric: 2

Alerts

Regiao - Sigla has constant value "SE" Constant
Estado - Sigla has constant value "SP" Constant
Produto has constant value "ETANOL" Constant
Unidade de Medida has constant value "R$ / litro" Constant
Dataset has 6 (< 0.1%) duplicate rows Duplicates
Municipio has a high cardinality: 127 distinct values High cardinality
Revenda has a high cardinality: 9618 distinct values High cardinality
CNPJ da Revenda has a high cardinality: 9925 distinct values High cardinality
Nome da Rua has a high cardinality: 5284 distinct values High cardinality
Numero Rua has a high cardinality: 3137 distinct values High cardinality
Complemento has a high cardinality: 747 distinct values High cardinality
Bairro has a high cardinality: 3652 distinct values High cardinality
Cep has a high cardinality: 5609 distinct values High cardinality
Data da Coleta has a high cardinality: 3691 distinct values High cardinality
Bandeira has a high cardinality: 142 distinct values High cardinality
Valor de Venda is highly correlated with Semestre and 1 other field High correlation
Valor de Compra is highly correlated with Semestre and 1 other field High correlation
Estado - Sigla is highly correlated with Regiao - Sigla and 3 other fields High correlation
Regiao - Sigla is highly correlated with Estado - Sigla and 3 other fields High correlation
Produto is highly correlated with Estado - Sigla and 3 other fields High correlation
Semestre is highly correlated with Valor de Venda and 1 other field High correlation
Unidade de Medida is highly correlated with Estado - Sigla and 3 other fields High correlation
Complemento has 1741308 (85.7%) missing values Missing
Valor de Compra has 965061 (47.5%) missing values Missing
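
The constant, duplicate and missing alerts can be reproduced with simple checks. The sketch below uses plain Python on a made-up four-row sample, so the values are illustrative only and the function names are hypothetical. With pandas, the same ideas map roughly onto `DataFrame.nunique`, `DataFrame.duplicated` and `DataFrame.isna`.

```python
from collections import Counter

# Toy records using column names from the report; the values are made up.
rows = [
    {"Estado - Sigla": "SP", "Municipio": "CAMPINAS", "Complemento": None},
    {"Estado - Sigla": "SP", "Municipio": "SANTOS", "Complemento": "KM 50"},
    {"Estado - Sigla": "SP", "Municipio": "CAMPINAS", "Complemento": None},
    {"Estado - Sigla": "SP", "Municipio": "BAURU", "Complemento": "TERREO"},
]


def constant_columns(records):
    """Columns whose non-missing values are all identical."""
    return [
        col
        for col in records[0].keys()
        if len({r[col] for r in records if r[col] is not None}) == 1
    ]


def duplicate_row_count(records):
    """Number of rows that repeat an earlier row exactly."""
    counts = Counter(tuple(r.items()) for r in records)
    return sum(n - 1 for n in counts.values())


def missing_share(records, col):
    """Fraction of rows where `col` is missing (None)."""
    return sum(r[col] is None for r in records) / len(records)


print(constant_columns(rows))
print(duplicate_row_count(rows))
print(missing_share(rows, "Complemento"))
```

Constant columns carry no information for modelling, which is presumably why the report marks them as rejected further down.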

Reproduction

Analysis started: 2022-09-21 00:30:45.576873
Analysis finished: 2022-09-21 00:32:27.357115
Duration: 1 minute and 41.78 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json
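
The reported duration can be checked against the two timestamps. A small sketch using only the standard library; the format string is an assumption based on how the timestamps are printed here:

```python
from datetime import datetime

STARTED = "2022-09-21 00:30:45.576873"
FINISHED = "2022-09-21 00:32:27.357115"
FMT = "%Y-%m-%d %H:%M:%S.%f"  # assumed from the printed timestamps


def elapsed_seconds(start, end, fmt=FMT):
    """Seconds between two timestamp strings in the same format."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


seconds = elapsed_seconds(STARTED, FINISHED)
minutes, remainder = divmod(seconds, 60)
print(f"{int(minutes)} minute and {remainder:.2f} seconds")
```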

Variables

Semestre
Categorical

HIGH CORRELATION

Distinct: 36
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

2004-02: 106032
2005-01: 105316
2007-01: 91080
2006-01: 84959
2006-02: 80790
Other values (31): 1563825

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 14224014
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
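
As an illustration of those character properties, the general category of each character can be looked up with Python's standard `unicodedata` module. This is a sketch of the idea and not necessarily how pandas-profiling computes its counts. The two input strings are Semestre values from this report, and `category_counts` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Nd', 'Pd') over all characters."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


counts = category_counts(["2004-02", "2005-01"])
total = sum(counts.values())
shares = {cat: round(100 * n / total, 1) for cat, n in counts.items()}
print(counts, shares)
```

Each value has six digits (`Nd`, Decimal Number) and one hyphen (`Pd`, Dash Punctuation), which matches the 85.7% and 14.3% split reported for this variable.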

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020-01
2nd row: 2020-01
3rd row: 2020-01
4th row: 2020-01
5th row: 2020-01

Common Values

Value              Count    Frequency (%)
2004-02            106032   5.2%
2005-01            105316   5.2%
2007-01            91080    4.5%
2006-01            84959    4.2%
2006-02            80790    4.0%
2005-02            77088    3.8%
2007-02            69732    3.4%
2008-02            64453    3.2%
2014-02            63522    3.1%
2013-02            63111    3.1%
Other values (26)  1225919  60.3%
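
A table of this shape (most frequent values plus an aggregated remainder row) can be built with a counter. The sketch below is an assumption about the general approach, with a hypothetical `common_values` helper and a tiny made-up sample:

```python
from collections import Counter


def common_values(values, top=10):
    """Return (value, count, percent) rows for the `top` most frequent values,
    followed by an "Other values (k)" row when more distinct values exist."""
    counts = Counter(values)
    total = sum(counts.values())
    ranked = counts.most_common()
    rows = [(v, n, 100 * n / total) for v, n in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(n for _, n in rest)
        rows.append((f"Other values ({len(rest)})", other, 100 * other / total))
    return rows


sample = ["2004-02"] * 3 + ["2005-01"] * 2 + ["2007-01"]
table = common_values(sample, top=2)
for value, count, pct in table:
    print(f"{value:<18}{count:>4}  {pct:.1f}%")
```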

Length

Histogram of lengths of the category

Most occurring characters

Value  Count    Frequency (%)
0      5128535  36.1%
2      3289103  23.1%
1      2240762  15.8%
-      2032002  14.3%
5      278602   2.0%
6      251194   1.8%
4      243597   1.7%
7      229273   1.6%
8      210189   1.5%
9      195947   1.4%

Most occurring categories

Value             Count     Frequency (%)
Decimal Number    12192012  85.7%
Dash Punctuation  2032002   14.3%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      5128535  42.1%
2      3289103  27.0%
1      2240762  18.4%
5      278602   2.3%
6      251194   2.1%
4      243597   2.0%
7      229273   1.9%
8      210189   1.7%
9      195947   1.6%
3      124810   1.0%
Dash Punctuation
Value  Count    Frequency (%)
-      2032002  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  14224014  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
0      5128535  36.1%
2      3289103  23.1%
1      2240762  15.8%
-      2032002  14.3%
5      278602   2.0%
6      251194   1.8%
4      243597   1.7%
7      229273   1.6%
8      210189   1.5%
9      195947   1.4%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  14224014  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
0      5128535  36.1%
2      3289103  23.1%
1      2240762  15.8%
-      2032002  14.3%
5      278602   2.0%
6      251194   1.8%
4      243597   1.7%
7      229273   1.6%
8      210189   1.5%
9      195947   1.4%

Regiao - Sigla
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

SE: 2032002

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 4064004
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SE
2nd row: SE
3rd row: SE
4th row: SE
5th row: SE
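
In the raw export, the name of this first column carries the stray prefix `ΓêÒΓòùΓõÉ`. The byte pattern is consistent with a UTF-8 byte-order mark that was decoded twice with a DOS Portuguese code page. The sketch below reverses that under the assumption that the code page is `cp860`, which the report itself does not state, and all function names are hypothetical. When reading the source file with pandas, passing `encoding="utf-8-sig"` to `read_csv` is the usual way to avoid the mark in the first place.

```python
CODE_PAGE = "cp860"  # assumption: DOS Portuguese code page, not stated in the report


def garble_once(text):
    """Simulate one bad round: UTF-8 bytes read back as the legacy code page."""
    return text.encode("utf-8").decode(CODE_PAGE)


def repair_once(text):
    """Undo one bad round: recover the original bytes and decode them as UTF-8."""
    return text.encode(CODE_PAGE).decode("utf-8")


def repair_header(text, rounds=2):
    """Undo `rounds` of mis-decoding, then drop a leading byte-order mark."""
    for _ in range(rounds):
        text = repair_once(text)
    return text.lstrip("\ufeff")


# Reproduce the damage on a clean header, then repair it again.
garbled = garble_once(garble_once("\ufeffRegiao - Sigla"))
print(repr(garbled))
print(repair_header(garbled))
```

The same two-step reversal also applies to accented letters inside values, which may explain the Greek, Box Drawing and currency characters counted for some of the address columns.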

Common Values

Value  Count    Frequency (%)
SE     2032002  100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
se2032002
100.0%

Most occurring characters

ValueCountFrequency (%)
S2032002
50.0%
E2032002
50.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter4064004
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S2032002
50.0%
E2032002
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4064004
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
S2032002
50.0%
E2032002
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4064004
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S2032002
50.0%
E2032002
50.0%

Estado - Sigla
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

SP: 2032002

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 4064004
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SP
2nd row: SP
3rd row: SP
4th row: SP
5th row: SP

Common Values

Value  Count    Frequency (%)
SP     2032002  100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
sp2032002
100.0%

Most occurring characters

ValueCountFrequency (%)
S2032002
50.0%
P2032002
50.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter4064004
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S2032002
50.0%
P2032002
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4064004
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
S2032002
50.0%
P2032002
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4064004
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S2032002
50.0%
P2032002
50.0%

Municipio
Categorical

HIGH CARDINALITY

Distinct: 127
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

SAO PAULO: 301391
CAMPINAS: 58556
RIBEIRAO PRETO: 50636
SANTO ANDRE: 36754
SAO JOSE DOS CAMPOS: 33787
Other values (122): 1550878

Length

Max length: 23
Median length: 19
Mean length: 9.755570122
Min length: 3

Characters and Unicode

Total characters: 19823338
Distinct characters: 24
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: GUARULHOS
2nd row: ADAMANTINA
3rd row: ADAMANTINA
4th row: ADAMANTINA
5th row: ADAMANTINA

Common Values

Value                  Count    Frequency (%)
SAO PAULO              301391   14.8%
CAMPINAS               58556    2.9%
RIBEIRAO PRETO         50636    2.5%
SANTO ANDRE            36754    1.8%
SAO JOSE DOS CAMPOS    33787    1.7%
BAURU                  33542    1.7%
SOROCABA               31952    1.6%
SAO JOSE DO RIO PRETO  31769    1.6%
OSASCO                 31372    1.5%
SANTOS                 30609    1.5%
Other values (117)     1391634  68.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
sao477955
 
14.6%
paulo301391
 
9.2%
do89291
 
2.7%
preto82405
 
2.5%
jose73842
 
2.3%
ribeirao60071
 
1.8%
campinas58556
 
1.8%
rio55773
 
1.7%
da48390
 
1.5%
mogi46316
 
1.4%
Other values (148)1974001
60.4%
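
The word table above counts lower-cased tokens rather than whole values. A minimal sketch of that kind of count, assuming whitespace tokenisation (the exact tokeniser used by pandas-profiling is not shown in this report, and `word_counts` is a hypothetical name):

```python
from collections import Counter


def word_counts(values):
    """Lower-case each value, split it on whitespace, and count the words."""
    return Counter(word for value in values for word in value.lower().split())


counts = word_counts(["SAO PAULO", "SAO JOSE DOS CAMPOS", "CAMPINAS"])
print(counts.most_common(3))
```

This explains why `sao` has a higher count than `paulo`: it also occurs in other municipality names such as SAO JOSE DOS CAMPOS.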

Most occurring characters

ValueCountFrequency (%)
A3828722
19.3%
O2191941
11.1%
S1442362
 
7.3%
R1392852
 
7.0%
I1344711
 
6.8%
1235989
 
6.2%
U1116950
 
5.6%
E977248
 
4.9%
T904232
 
4.6%
P846626
 
4.3%
Other values (14)4541705
22.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter18570365
93.7%
Space Separator1235989
 
6.2%
Other Punctuation16984
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A3828722
20.6%
O2191941
11.8%
S1442362
 
7.8%
R1392852
 
7.5%
I1344711
 
7.2%
U1116950
 
6.0%
E977248
 
5.3%
T904232
 
4.9%
P846626
 
4.6%
N754358
 
4.1%
Other values (12)3770363
20.3%
Space Separator
ValueCountFrequency (%)
1235989
100.0%
Other Punctuation
ValueCountFrequency (%)
'16984
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin18570365
93.7%
Common1252973
 
6.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
A3828722
20.6%
O2191941
11.8%
S1442362
 
7.8%
R1392852
 
7.5%
I1344711
 
7.2%
U1116950
 
6.0%
E977248
 
5.3%
T904232
 
4.9%
P846626
 
4.6%
N754358
 
4.1%
Other values (12)3770363
20.3%
Common
ValueCountFrequency (%)
1235989
98.6%
'16984
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII19823338
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A3828722
19.3%
O2191941
11.1%
S1442362
 
7.3%
R1392852
 
7.0%
I1344711
 
6.8%
1235989
 
6.2%
U1116950
 
5.6%
E977248
 
4.9%
T904232
 
4.6%
P846626
 
4.3%
Other values (14)4541705
22.9%

Revenda
Categorical

HIGH CARDINALITY

Distinct: 9618
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

COMPANHIA BRASILEIRA DE DISTRIBUICAO: 12424
SETE ESTRELAS COMERCIO DE DERIVADOS DE PETROLEO LTDA: 12410
CARREFOUR COMERCIO E INDUSTRIA LTDA: 10273
REDE DE POSTOS SETE ESTRELAS LTDA: 9489
REDE LK DE POSTOS LTDA: 4957
Other values (9613): 1982449

Length

Max length: 84
Median length: 72
Mean length: 30.67956528
Min length: 8

Characters and Unicode

Total characters: 62340938
Distinct characters: 83
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 147
Unique (%): < 0.1%

Sample

1st row: AUTO POSTO SAKAMOTO LTDA
2nd row: REDE GAZOLI AUTO POSTO LTDA.
3rd row: AUTO POSTO CARREIRO LTDA
4th row: AUTO POSTO PROGRESSO DE ADAMANTINA LTDA
5th row: MARCIO A SPOSITO TRANSPORTES LTDA

Common Values

Value                                                           Count    Frequency (%)
COMPANHIA BRASILEIRA DE DISTRIBUICAO                            12424    0.6%
SETE ESTRELAS COMERCIO DE DERIVADOS DE PETROLEO LTDA            12410    0.6%
CARREFOUR COMERCIO E INDUSTRIA LTDA                             10273    0.5%
REDE DE POSTOS SETE ESTRELAS LTDA                               9489     0.5%
REDE LK DE POSTOS LTDA                                          4957     0.2%
COMPETRO COMERCIO E DISTRIBUICAO DE DERIVADOS DE PETROLEO LTDA  4450     0.2%
MAKRO ATACADISTA S.A                                            3234     0.2%
FELIMAR AUTO POSTO LTDA.                                        2672     0.1%
AUTO POSTO AVENIDA LTDA                                         2493     0.1%
COOPERCITRUS COOPERATIVA DE PRODUTORES RURAIS                   2447     0.1%
Other values (9608)                                             1967153  96.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ltda1877750
18.0%
posto1514455
 
14.5%
auto1223404
 
11.7%
de625387
 
6.0%
192570
 
1.8%
e116411
 
1.1%
comercio109031
 
1.0%
combustiveis91982
 
0.9%
servicos90330
 
0.9%
centro77369
 
0.7%
Other values (6791)4519597
43.3%

Most occurring characters

ValueCountFrequency (%)
8424173
13.5%
O7433787
11.9%
A7089023
11.4%
T6378944
10.2%
S3601155
 
5.8%
E3512286
 
5.6%
D3369019
 
5.4%
L3079403
 
4.9%
I3026277
 
4.9%
R2671459
 
4.3%
Other values (73)13755412
22.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter52115068
83.6%
Space Separator8424173
 
13.5%
Other Punctuation546984
 
0.9%
Lowercase Letter324513
 
0.5%
Other Symbol271784
 
0.4%
Currency Symbol244993
 
0.4%
Decimal Number181202
 
0.3%
Dash Punctuation100600
 
0.2%
Other Letter92904
 
0.1%
Other Number35357
 
0.1%
Other values (2)3360
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
O7433787
14.3%
A7089023
13.6%
T6378944
12.2%
S3601155
 
6.9%
E3512286
 
6.7%
D3369019
 
6.5%
L3079403
 
5.9%
I3026277
 
5.8%
R2671459
 
5.1%
P2317998
 
4.4%
Other values (24)9635717
18.5%
Lowercase Letter
ValueCountFrequency (%)
õ246466
75.9%
ó44438
 
13.7%
è31720
 
9.8%
ò1867
 
0.6%
a4
 
< 0.1%
r3
 
< 0.1%
e3
 
< 0.1%
i3
 
< 0.1%
u2
 
< 0.1%
f1
 
< 0.1%
Other values (6)6
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
046544
25.7%
133460
18.5%
227892
15.4%
315900
 
8.8%
515176
 
8.4%
412621
 
7.0%
69375
 
5.2%
97091
 
3.9%
77087
 
3.9%
86056
 
3.3%
Other Punctuation
ValueCountFrequency (%)
.421924
77.1%
&107663
 
19.7%
'6740
 
1.2%
¿4645
 
0.8%
,3921
 
0.7%
"1712
 
0.3%
%181
 
< 0.1%
/156
 
< 0.1%
#42
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
242201
89.1%
15221
 
5.6%
11024
 
4.1%
2300
 
0.8%
968
 
0.4%
69
 
< 0.1%
1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
8424173
100.0%
Currency Symbol
ValueCountFrequency (%)
£244993
100.0%
Dash Punctuation
ValueCountFrequency (%)
-100600
100.0%
Other Letter
ValueCountFrequency (%)
º92904
100.0%
Other Number
ValueCountFrequency (%)
¼35357
100.0%
Modifier Symbol
ValueCountFrequency (%)
`1773
100.0%
Math Symbol
ValueCountFrequency (%)
+1587
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin52284662
83.9%
Common9808453
 
15.7%
Greek247823
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
O7433787
14.2%
A7089023
13.6%
T6378944
12.2%
S3601155
 
6.9%
E3512286
 
6.7%
D3369019
 
6.4%
L3079403
 
5.9%
I3026277
 
5.8%
R2671459
 
5.1%
P2317998
 
4.4%
Other values (40)9805311
18.8%
Common
ValueCountFrequency (%)
8424173
85.9%
.421924
 
4.3%
£244993
 
2.5%
242201
 
2.5%
&107663
 
1.1%
-100600
 
1.0%
046544
 
0.5%
¼35357
 
0.4%
133460
 
0.3%
227892
 
0.3%
Other values (22)123646
 
1.3%
Greek
ValueCountFrequency (%)
Γ247823
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII61112929
98.0%
None956225
 
1.5%
Box Drawing270816
 
0.4%
Block Elements968
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
8424173
13.8%
O7433787
12.2%
A7089023
11.6%
T6378944
10.4%
S3601155
 
5.9%
E3512286
 
5.7%
D3369019
 
5.5%
L3079403
 
5.0%
I3026277
 
5.0%
R2671459
 
4.4%
Other values (50)12527403
20.5%
None
ValueCountFrequency (%)
Γ247823
25.9%
õ246466
25.8%
£244993
25.6%
º92904
 
9.7%
ó44438
 
4.6%
¼35357
 
3.7%
è31720
 
3.3%
¿4645
 
0.5%
Ò3264
 
0.3%
ò1867
 
0.2%
Other values (6)2748
 
0.3%
Box Drawing
ValueCountFrequency (%)
242201
89.4%
15221
 
5.6%
11024
 
4.1%
2300
 
0.8%
69
 
< 0.1%
1
 
< 0.1%
Block Elements
ValueCountFrequency (%)
968
100.0%

CNPJ da Revenda
Categorical

HIGH CARDINALITY

Distinct: 9925
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

58.444.084/0001-32: 865
44.610.343/0001-43: 860
66.706.243/0001-58: 860
01.215.172/0001-45: 857
48.192.819/0001-24: 856
Other values (9920): 2027704

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 38608038
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 149
Unique (%): < 0.1%

Sample

1st row:  49.051.667/0001-02
2nd row:  09.116.143/0001-38
3rd row:  55.451.876/0001-46
4th row:  52.605.052/0001-95
5th row:  54.187.588/0002-44

Common Values

Value                Count    Frequency (%)
58.444.084/0001-32   865      < 0.1%
44.610.343/0001-43   860      < 0.1%
66.706.243/0001-58   860      < 0.1%
01.215.172/0001-45   857      < 0.1%
48.192.819/0001-24   856      < 0.1%
03.687.679/0001-27   856      < 0.1%
55.845.366/0001-53   850      < 0.1%
65.755.688/0001-65   844      < 0.1%
03.419.315/0001-66   840      < 0.1%
59.532.275/0001-19   839      < 0.1%
Other values (9915)  2023475  99.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
58.444.084/0001-32865
 
< 0.1%
66.706.243/0001-58860
 
< 0.1%
44.610.343/0001-43860
 
< 0.1%
01.215.172/0001-45857
 
< 0.1%
48.192.819/0001-24856
 
< 0.1%
03.687.679/0001-27856
 
< 0.1%
55.845.366/0001-53850
 
< 0.1%
65.755.688/0001-65844
 
< 0.1%
03.419.315/0001-66840
 
< 0.1%
59.532.275/0001-19839
 
< 0.1%
Other values (9915)2023475
99.6%

Most occurring characters

ValueCountFrequency (%)
09135989
23.7%
.4064004
10.5%
13828434
9.9%
42221164
 
5.8%
52172231
 
5.6%
2032002
 
5.3%
/2032002
 
5.3%
-2032002
 
5.3%
61961845
 
5.1%
21915645
 
5.0%
Other values (4)7212720
18.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number28448028
73.7%
Other Punctuation6096006
 
15.8%
Space Separator2032002
 
5.3%
Dash Punctuation2032002
 
5.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
09135989
32.1%
13828434
13.5%
42221164
 
7.8%
52172231
 
7.6%
61961845
 
6.9%
21915645
 
6.7%
31909265
 
6.7%
91774651
 
6.2%
71772043
 
6.2%
81756761
 
6.2%
Other Punctuation
ValueCountFrequency (%)
.4064004
66.7%
/2032002
33.3%
Space Separator
ValueCountFrequency (%)
2032002
100.0%
Dash Punctuation
ValueCountFrequency (%)
-2032002
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common38608038
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
09135989
23.7%
.4064004
10.5%
13828434
9.9%
42221164
 
5.8%
52172231
 
5.6%
2032002
 
5.3%
/2032002
 
5.3%
-2032002
 
5.3%
61961845
 
5.1%
21915645
 
5.0%
Other values (4)7212720
18.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII38608038
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
09135989
23.7%
.4064004
10.5%
13828434
9.9%
42221164
 
5.8%
52172231
 
5.6%
2032002
 
5.3%
/2032002
 
5.3%
-2032002
 
5.3%
61961845
 
5.1%
21915645
 
5.0%
Other values (4)7212720
18.7%

Nome da Rua
Categorical

HIGH CARDINALITY

Distinct: 5284
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

AVENIDA BRASIL: 18323
RODOVIA PRESIDENTE DUTRA: 10852
AVENIDA PRESIDENTE KENNEDY: 10359
AVENIDA PRESIDENTE VARGAS: 10164
AVENIDA INDEPENDENCIA: 8849
Other values (5279): 1973455

Length

Max length: 62
Median length: 50
Mean length: 22.74089986
Min length: 5

Characters and Unicode

Total characters: 46209554
Distinct characters: 79
Distinct categories: 14
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 68
Unique (%): < 0.1%

Sample

1st row: RODOVIA PRESIDENTE DUTRA
2nd row: AVENIDA MARECHAL CASTELO BRANCO
3rd row: AVENIDA CAP JOSE A DE OLIVEIRA
4th row: AVENIDA RIO BRANCO
5th row: AVENIDA RIO BRANCO

Common Values

Value                       Count    Frequency (%)
AVENIDA BRASIL              18323    0.9%
RODOVIA PRESIDENTE DUTRA    10852    0.5%
AVENIDA PRESIDENTE KENNEDY  10359    0.5%
AVENIDA PRESIDENTE VARGAS   10164    0.5%
AVENIDA INDEPENDENCIA       8849     0.4%
AVENIDA WASHINGTON LUIZ     8159     0.4%
RUA FLORIANO PEIXOTO        7471     0.4%
AVENIDA RIO BRANCO          7151     0.4%
AVENIDA TIRADENTES          6928     0.3%
AVENIDA SANTOS DUMONT       6264     0.3%
Other values (5274)         1937482  95.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
avenida1074178
 
15.2%
rua728623
 
10.3%
de295010
 
4.2%
rodovia106318
 
1.5%
da77338
 
1.1%
jose75230
 
1.1%
antonio59827
 
0.8%
joao56924
 
0.8%
presidente55059
 
0.8%
do54360
 
0.8%
Other values (3877)4482766
63.4%

Most occurring characters

ValueCountFrequency (%)
A7090114
15.3%
5084452
11.0%
E4090637
 
8.9%
I3601905
 
7.8%
O3416705
 
7.4%
R3414528
 
7.4%
N2879955
 
6.2%
D2764942
 
6.0%
S1889396
 
4.1%
U1689451
 
3.7%
Other values (69)10287469
22.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter40335004
87.3%
Space Separator5084452
 
11.0%
Decimal Number208186
 
0.5%
Lowercase Letter184260
 
0.4%
Other Symbol133066
 
0.3%
Currency Symbol117281
 
0.3%
Other Punctuation97276
 
0.2%
Other Letter20432
 
< 0.1%
Dash Punctuation12791
 
< 0.1%
Close Punctuation5037
 
< 0.1%
Other values (4)11769
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A7090114
17.6%
E4090637
10.1%
I3601905
8.9%
O3416705
8.5%
R3414528
8.5%
N2879955
 
7.1%
D2764942
 
6.9%
S1889396
 
4.7%
U1689451
 
4.2%
V1595497
 
4.0%
Other values (24)7901874
19.6%
Lowercase Letter
ValueCountFrequency (%)
õ116697
63.3%
ó51611
28.0%
è14105
 
7.7%
ò1508
 
0.8%
ç326
 
0.2%
a3
 
< 0.1%
n2
 
< 0.1%
o2
 
< 0.1%
u1
 
< 0.1%
e1
 
< 0.1%
Other values (4)4
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
141552
20.0%
332964
15.8%
231636
15.2%
021433
10.3%
421264
10.2%
914823
 
7.1%
714567
 
7.0%
513859
 
6.7%
611182
 
5.4%
84906
 
2.4%
Other Symbol
ValueCountFrequency (%)
113912
85.6%
11578
 
8.7%
3847
 
2.9%
2256
 
1.7%
1369
 
1.0%
104
 
0.1%
Other Punctuation
ValueCountFrequency (%)
.74514
76.6%
,12866
 
13.2%
/3827
 
3.9%
'3001
 
3.1%
¿2528
 
2.6%
:540
 
0.6%
Space Separator
ValueCountFrequency (%)
5084452
100.0%
Currency Symbol
ValueCountFrequency (%)
£117281
100.0%
Other Letter
ValueCountFrequency (%)
º20432
100.0%
Dash Punctuation
ValueCountFrequency (%)
-12791
100.0%
Close Punctuation
ValueCountFrequency (%)
)5037
100.0%
Open Punctuation
ValueCountFrequency (%)
(5037
100.0%
Other Number
ValueCountFrequency (%)
¼3974
100.0%
Math Symbol
ValueCountFrequency (%)
+1632
100.0%
Modifier Symbol
ValueCountFrequency (%)
`1126
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin40421561
87.5%
Common5669858
 
12.3%
Greek118135
 
0.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
A7090114
17.5%
E4090637
10.1%
I3601905
8.9%
O3416705
8.5%
R3414528
8.4%
N2879955
 
7.1%
D2764942
 
6.8%
S1889396
 
4.7%
U1689451
 
4.2%
V1595497
 
3.9%
Other values (38)7988431
19.8%
Common
ValueCountFrequency (%)
5084452
89.7%
£117281
 
2.1%
113912
 
2.0%
.74514
 
1.3%
141552
 
0.7%
332964
 
0.6%
231636
 
0.6%
021433
 
0.4%
421264
 
0.4%
914823
 
0.3%
Other values (20)116027
 
2.0%
Greek
ValueCountFrequency (%)
Γ118135
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII45627117
98.7%
None449371
 
1.0%
Box Drawing131697
 
0.3%
Block Elements1369
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A7090114
15.5%
5084452
11.1%
E4090637
9.0%
I3601905
 
7.9%
O3416705
 
7.5%
R3414528
 
7.5%
N2879955
 
6.3%
D2764942
 
6.1%
S1889396
 
4.1%
U1689451
 
3.7%
Other values (46)9705032
21.3%
None
ValueCountFrequency (%)
Γ118135
26.3%
£117281
26.1%
õ116697
26.0%
ó51611
11.5%
º20432
 
4.5%
è14105
 
3.1%
¼3974
 
0.9%
¿2528
 
0.6%
ò1508
 
0.3%
À1127
 
0.3%
Other values (7)1973
 
0.4%
Box Drawing
ValueCountFrequency (%)
113912
86.5%
11578
 
8.8%
3847
 
2.9%
2256
 
1.7%
104
 
0.1%
Block Elements
ValueCountFrequency (%)
1369
100.0%

Numero Rua
Categorical

HIGH CARDINALITY

Distinct: 3137
Distinct (%): 0.2%
Missing: 152
Missing (%): < 0.1%
Memory size: 15.5 MiB

S/N: 108763
15: 11559
30: 9215
300: 8626
10: 8555
Other values (3132): 1885132

Length

Max length: 15
Median length: 11
Mean length: 3.311819278
Min length: 1

Characters and Unicode

Total characters: 6729120
Distinct characters: 42
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 24
Unique (%): < 0.1%

Sample

1st row: S/N
2nd row: 15
3rd row: 160
4th row: 764
5th row: 1625

Common Values

Value                Count    Frequency (%)
S/N                  108763   5.4%
15                   11559    0.6%
30                   9215     0.5%
300                  8626     0.4%
10                   8555     0.4%
SN                   8326     0.4%
25                   7861     0.4%
600                  7671     0.4%
20                   7267     0.4%
21                   7210     0.4%
Other values (3127)  1846797  90.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
s/n109969
 
5.4%
1511559
 
0.6%
309215
 
0.5%
3008626
 
0.4%
108555
 
0.4%
sn8326
 
0.4%
258179
 
0.4%
6007687
 
0.4%
217549
 
0.4%
207267
 
0.4%
Other values (3101)1853428
90.8%

Most occurring characters

ValueCountFrequency (%)
11094054
16.3%
0855712
12.7%
2717234
10.7%
5672921
10.0%
3641447
9.5%
4558498
8.3%
6482382
7.2%
7451592
6.7%
9415748
 
6.2%
8414964
 
6.2%
Other values (32)424568
 
6.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number6304552
93.7%
Uppercase Letter248589
 
3.7%
Other Punctuation149976
 
2.2%
Dash Punctuation16489
 
0.2%
Space Separator8878
 
0.1%
Lowercase Letter556
 
< 0.1%
Other Number76
 
< 0.1%
Currency Symbol2
 
< 0.1%
Other Symbol2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
m147
26.4%
õ78
14.0%
ò78
14.0%
a67
12.1%
i55
 
9.9%
j32
 
5.8%
u32
 
5.8%
l32
 
5.8%
g12
 
2.2%
o12
 
2.2%
Other values (4)11
 
2.0%
Decimal Number
ValueCountFrequency (%)
11094054
17.4%
0855712
13.6%
2717234
11.4%
5672921
10.7%
3641447
10.2%
4558498
8.9%
6482382
7.7%
7451592
7.2%
9415748
 
6.6%
8414964
 
6.6%
Uppercase Letter
ValueCountFrequency (%)
S121479
48.9%
N120990
48.7%
K2136
 
0.9%
M2044
 
0.8%
B790
 
0.3%
R789
 
0.3%
Γ156
 
0.1%
A88
 
< 0.1%
À78
 
< 0.1%
L39
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/116676
77.8%
,31678
 
21.1%
.1622
 
1.1%
Dash Punctuation
ValueCountFrequency (%)
-16489
100.0%
Space Separator
ValueCountFrequency (%)
8878
100.0%
Other Number
ValueCountFrequency (%)
¼76
100.0%
Currency Symbol
ValueCountFrequency (%)
£2
100.0%
Other Symbol
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common6479975
96.3%
Latin248989
 
3.7%
Greek156
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
S121479
48.8%
N120990
48.6%
K2136
 
0.9%
M2044
 
0.8%
B790
 
0.3%
R789
 
0.3%
m147
 
0.1%
A88
 
< 0.1%
õ78
 
< 0.1%
ò78
 
< 0.1%
Other values (13)370
 
0.1%
Common
ValueCountFrequency (%)
11094054
16.9%
0855712
13.2%
2717234
11.1%
5672921
10.4%
3641447
9.9%
4558498
8.6%
6482382
7.4%
7451592
7.0%
9415748
 
6.4%
8414964
 
6.4%
Other values (8)175423
 
2.7%
Greek
ValueCountFrequency (%)
Γ156
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII6728648
> 99.9%
None470
 
< 0.1%
Box Drawing2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
11094054
16.3%
0855712
12.7%
2717234
10.7%
5672921
10.0%
3641447
9.5%
4558498
8.3%
6482382
7.2%
7451592
6.7%
9415748
 
6.2%
8414964
 
6.2%
Other values (24)424096
 
6.3%
None
ValueCountFrequency (%)
Γ156
33.2%
õ78
16.6%
ò78
16.6%
À78
16.6%
¼76
16.2%
£2
 
0.4%
ç2
 
0.4%
Box Drawing
ValueCountFrequency (%)
2
100.0%

Complemento
Categorical

HIGH CARDINALITY
MISSING

Distinct: 747
Distinct (%): 0.3%
Missing: 1741308
Missing (%): 85.7%
Memory size: 15.5 MiB

0: 58739
(blank value): 37737
TERREO: 6229
SETOR I: 4917
A: 4854
Other values (742): 178218

Length

Max length: 113
Median length: 108
Mean length: 7.164774643
Min length: 1

Characters and Unicode

Total characters: 2082757
Distinct characters: 59
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): < 0.1%

Sample

1st row: KM 210,5-SENT SP/RJ
2nd row: ESQ.R.RACHID KASSOUF
3rd row: KM. 47,5
4th row: KM 50
5th row: KM 40

Common Values

Value               Count    Frequency (%)
0                   58739    2.9%
(blank value)       37737    1.9%
TERREO              6229     0.3%
SETOR I             4917     0.2%
A                   4854     0.2%
PARTE               3438     0.2%
POSTO               2569     0.1%
-                   2008     0.1%
B                   1531     0.1%
.                   1280     0.1%
Other values (737)  167392   8.2%
(Missing)           1741308  85.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
km92593
 
17.4%
059518
 
11.2%
17502
 
3.3%
a12154
 
2.3%
lote8435
 
1.6%
parte7183
 
1.3%
terreo6910
 
1.3%
quadra6072
 
1.1%
i5905
 
1.1%
setor5848
 
1.1%
Other values (841)311148
58.3%

Most occurring characters

ValueCountFrequency (%)
428384
20.6%
M118799
 
5.7%
0118574
 
5.7%
A114237
 
5.5%
K94595
 
4.5%
E91334
 
4.4%
O84446
 
4.1%
R77242
 
3.7%
175786
 
3.6%
T68874
 
3.3%
Other values (49)810486
38.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1024370
49.2%
Decimal Number528253
25.4%
Space Separator428384
20.6%
Other Punctuation67320
 
3.2%
Dash Punctuation13563
 
0.7%
Math Symbol11943
 
0.6%
Lowercase Letter4272
 
0.2%
Other Symbol1760
 
0.1%
Currency Symbol1423
 
0.1%
Other Number1090
 
0.1%
Other values (2)379
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M118799
11.6%
A114237
11.2%
K94595
 
9.2%
E91334
 
8.9%
O84446
 
8.2%
R77242
 
7.5%
T68874
 
6.7%
S52351
 
5.1%
I44231
 
4.3%
N35734
 
3.5%
Other values (19)242527
23.7%
Decimal Number
ValueCountFrequency (%)
0118574
22.4%
175786
14.3%
266703
12.6%
550478
9.6%
346582
 
8.8%
444639
 
8.5%
635071
 
6.6%
834589
 
6.5%
732349
 
6.1%
923482
 
4.4%
Other Punctuation
ValueCountFrequency (%)
.23469
34.9%
,22620
33.6%
/18243
27.1%
;1566
 
2.3%
:1388
 
2.1%
'34
 
0.1%
Lowercase Letter
ValueCountFrequency (%)
õ2524
59.1%
ò950
 
22.2%
ó655
 
15.3%
è143
 
3.3%
Other Symbol
ValueCountFrequency (%)
1398
79.4%
198
 
11.2%
164
 
9.3%
Space Separator
ValueCountFrequency (%)
428384
100.0%
Dash Punctuation
ValueCountFrequency (%)
-13563
100.0%
Math Symbol
ValueCountFrequency (%)
+11943
100.0%
Currency Symbol
ValueCountFrequency (%)
£1423
100.0%
Other Number
ValueCountFrequency (%)
¼1090
100.0%
Other Letter
ValueCountFrequency (%)
º202
100.0%
Modifier Symbol
ValueCountFrequency (%)
`177
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1053913
50.6%
Latin1025214
49.2%
Greek3630
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
M118799
11.6%
A114237
11.1%
K94595
 
9.2%
E91334
 
8.9%
O84446
 
8.2%
R77242
 
7.5%
T68874
 
6.7%
S52351
 
5.1%
I44231
 
4.3%
N35734
 
3.5%
Other values (23)243371
23.7%
Common
ValueCountFrequency (%)
428384
40.6%
0118574
 
11.3%
175786
 
7.2%
266703
 
6.3%
550478
 
4.8%
346582
 
4.4%
444639
 
4.2%
635071
 
3.3%
834589
 
3.3%
732349
 
3.1%
Other values (15)120758
 
11.5%
Greek
ValueCountFrequency (%)
Γ3630
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2069071
99.3%
None11926
 
0.6%
Box Drawing1562
 
0.1%
Block Elements198
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
428384
20.7%
M118799
 
5.7%
0118574
 
5.7%
A114237
 
5.5%
K94595
 
4.6%
E91334
 
4.4%
O84446
 
4.1%
R77242
 
3.7%
175786
 
3.7%
T68874
 
3.3%
Other values (35)796800
38.5%
None
ValueCountFrequency (%)
Γ3630
30.4%
õ2524
21.2%
£1423
 
11.9%
À1106
 
9.3%
¼1090
 
9.1%
ò950
 
8.0%
ó655
 
5.5%
º202
 
1.7%
Ú167
 
1.4%
è143
 
1.2%
Box Drawing
ValueCountFrequency (%)
1398
89.5%
164
 
10.5%
Block Elements
ValueCountFrequency (%)
198
100.0%

Bairro
Categorical

HIGH CARDINALITY

Distinct: 3652
Distinct (%): 0.2%
Missing: 5502
Missing (%): 0.3%
Memory size: 15.5 MiB

CENTRO: 373609
ZONA RURAL: 13513
JARDIM PAULISTA: 11156
IPIRANGA: 9678
BELA VISTA: 9426
Other values (3647): 1609118

Length

Max length: 48
Median length: 40
Mean length: 11.96545769
Min length: 1

Characters and Unicode

Total characters: 24248000
Distinct characters: 71
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 45
Unique (%): < 0.1%

Sample

1st row: BONSUCESSO
2nd row: VILA JAMIL DE LIMA
3rd row: CENTRO
4th row: CENTRO
5th row: VILA INDUSTRIAL

Common Values

Value | Count | Frequency (%)
CENTRO | 373609 | 18.4%
ZONA RURAL | 13513 | 0.7%
JARDIM PAULISTA | 11156 | 0.5%
IPIRANGA | 9678 | 0.5%
BELA VISTA | 9426 | 0.5%
VILA NOVA | 9423 | 0.5%
SANTANA | 8972 | 0.4%
JARDIM BELA VISTA | 7266 | 0.4%
SANTO AMARO | 6893 | 0.3%
JARDIM AMERICA | 6354 | 0.3%
Other values (3642) | 1570210 | 77.3%
(Missing) | 5502 | 0.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
centro375653
 
9.9%
vila354119
 
9.3%
jardim326468
 
8.6%
jd81388
 
2.1%
santa72387
 
1.9%
parque64237
 
1.7%
sao58903
 
1.6%
nova56251
 
1.5%
do40504
 
1.1%
vista30400
 
0.8%
Other values (2395)2337226
61.5%

Most occurring characters

ValueCountFrequency (%)
A3561254
14.7%
I2019708
 
8.3%
R1966111
 
8.1%
1776919
 
7.3%
O1749819
 
7.2%
E1607983
 
6.6%
N1340560
 
5.5%
T1092667
 
4.5%
L1036693
 
4.3%
D997375
 
4.1%
Other values (61)7098911
29.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter21674276
89.4%
Space Separator1776919
 
7.3%
Lowercase Letter259840
 
1.1%
Other Symbol213993
 
0.9%
Currency Symbol179816
 
0.7%
Other Punctuation78846
 
0.3%
Other Letter24734
 
0.1%
Other Number18622
 
0.1%
Decimal Number9028
 
< 0.1%
Dash Punctuation4030
 
< 0.1%
Other values (2)7896
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A3561254
16.4%
I2019708
 
9.3%
R1966111
 
9.1%
O1749819
 
8.1%
E1607983
 
7.4%
N1340560
 
6.2%
T1092667
 
5.0%
L1036693
 
4.8%
D997375
 
4.6%
C973006
 
4.5%
Other values (25)5329100
24.6%
Lowercase Letter
ValueCountFrequency (%)
õ176716
68.0%
ó56338
 
21.7%
è26043
 
10.0%
ò737
 
0.3%
r1
 
< 0.1%
a1
 
< 0.1%
c1
 
< 0.1%
e1
 
< 0.1%
l1
 
< 0.1%
i1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
02875
31.8%
11733
19.2%
51301
14.4%
21157
12.8%
3792
 
8.8%
7414
 
4.6%
8399
 
4.4%
4357
 
4.0%
Other Symbol
ValueCountFrequency (%)
175090
81.8%
16932
 
7.9%
15950
 
7.5%
3808
 
1.8%
2030
 
0.9%
183
 
0.1%
Other Punctuation
ValueCountFrequency (%)
.71300
90.4%
¿4116
 
5.2%
/1680
 
2.1%
'1373
 
1.7%
:377
 
0.5%
Space Separator
ValueCountFrequency (%)
1776919
100.0%
Currency Symbol
ValueCountFrequency (%)
£179816
100.0%
Other Letter
ValueCountFrequency (%)
º24734
100.0%
Other Number
ValueCountFrequency (%)
¼18622
100.0%
Dash Punctuation
ValueCountFrequency (%)
-4030
100.0%
Close Punctuation
ValueCountFrequency (%)
)3948
100.0%
Open Punctuation
ValueCountFrequency (%)
(3948
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin21781592
89.8%
Common2289150
 
9.4%
Greek177258
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
A3561254
16.3%
I2019708
 
9.3%
R1966111
 
9.0%
O1749819
 
8.0%
E1607983
 
7.4%
N1340560
 
6.2%
T1092667
 
5.0%
L1036693
 
4.8%
D997375
 
4.6%
C973006
 
4.5%
Other values (35)5436416
25.0%
Common
ValueCountFrequency (%)
1776919
77.6%
£179816
 
7.9%
175090
 
7.6%
.71300
 
3.1%
¼18622
 
0.8%
16932
 
0.7%
15950
 
0.7%
¿4116
 
0.2%
-4030
 
0.2%
)3948
 
0.2%
Other values (15)22427
 
1.0%
Greek
ValueCountFrequency (%)
Γ177258
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII23365680
96.4%
None668327
 
2.8%
Box Drawing210185
 
0.9%
Block Elements3808
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A3561254
15.2%
I2019708
 
8.6%
R1966111
 
8.4%
1776919
 
7.6%
O1749819
 
7.5%
E1607983
 
6.9%
N1340560
 
5.7%
T1092667
 
4.7%
L1036693
 
4.4%
D997375
 
4.3%
Other values (38)6216591
26.6%
None
ValueCountFrequency (%)
£179816
26.9%
Γ177258
26.5%
õ176716
26.4%
ó56338
 
8.4%
è26043
 
3.9%
º24734
 
3.7%
¼18622
 
2.8%
¿4116
 
0.6%
Ò3074
 
0.5%
ò737
 
0.1%
Other values (7)873
 
0.1%
Box Drawing
ValueCountFrequency (%)
175090
83.3%
16932
 
8.1%
15950
 
7.6%
2030
 
1.0%
183
 
0.1%
Block Elements
ValueCountFrequency (%)
3808
100.0%

Cep
Categorical

HIGH CARDINALITY

Distinct: 5609
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

11680-000: 11181
13140-000: 8949
18900-000: 8941
17400-000: 8459
17900-000: 8457
Other values (5604): 1986015

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 18288018
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 50
Unique (%): < 0.1%

Sample

1st row: 07178-580
2nd row: 17800-000
3rd row: 17800-000
4th row: 17800-000
5th row: 17800-000

Common Values

Value | Count | Frequency (%)
11680-000 | 11181 | 0.6%
13140-000 | 8949 | 0.4%
18900-000 | 8941 | 0.4%
17400-000 | 8459 | 0.4%
17900-000 | 8457 | 0.4%
19400-000 | 8403 | 0.4%
15910-000 | 8306 | 0.4%
15200-000 | 8286 | 0.4%
12940-000 | 8107 | 0.4%
14900-000 | 8091 | 0.4%
Other values (5599) | 1944822 | 95.7%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
06232824
34.1%
12705551
14.8%
-2032002
 
11.1%
31321571
 
7.2%
21122619
 
6.1%
51017867
 
5.6%
41009631
 
5.5%
7785840
 
4.3%
6772820
 
4.2%
8710202
 
3.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number16256016
88.9%
Dash Punctuation2032002
 
11.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
06232824
38.3%
12705551
16.6%
31321571
 
8.1%
21122619
 
6.9%
51017867
 
6.3%
41009631
 
6.2%
7785840
 
4.8%
6772820
 
4.8%
8710202
 
4.4%
9577091
 
3.6%
Dash Punctuation
ValueCountFrequency (%)
-2032002
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common18288018
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
06232824
34.1%
12705551
14.8%
-2032002
 
11.1%
31321571
 
7.2%
21122619
 
6.1%
51017867
 
5.6%
41009631
 
5.5%
7785840
 
4.3%
6772820
 
4.2%
8710202
 
3.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII18288018
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
06232824
34.1%
12705551
14.8%
-2032002
 
11.1%
31321571
 
7.2%
21122619
 
6.1%
51017867
 
5.6%
41009631
 
5.5%
7785840
 
4.3%
6772820
 
4.2%
8710202
 
3.9%

Produto
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

ETANOL: 2032002

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 12192012
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ETANOL
2nd row: ETANOL
3rd row: ETANOL
4th row: ETANOL
5th row: ETANOL

Common Values

Value | Count | Frequency (%)
ETANOL | 2032002 | 100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
etanol2032002
100.0%

Most occurring characters

ValueCountFrequency (%)
E2032002
16.7%
T2032002
16.7%
A2032002
16.7%
N2032002
16.7%
O2032002
16.7%
L2032002
16.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter12192012
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E2032002
16.7%
T2032002
16.7%
A2032002
16.7%
N2032002
16.7%
O2032002
16.7%
L2032002
16.7%

Most occurring scripts

ValueCountFrequency (%)
Latin12192012
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
E2032002
16.7%
T2032002
16.7%
A2032002
16.7%
N2032002
16.7%
O2032002
16.7%
L2032002
16.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII12192012
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E2032002
16.7%
T2032002
16.7%
A2032002
16.7%
N2032002
16.7%
O2032002
16.7%
L2032002
16.7%

Data da Coleta
Categorical

HIGH CARDINALITY

Distinct: 3691
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

31/01/2005: 3069
28/02/2007: 2661
16/11/2004: 2592
28/03/2005: 2561
03/01/2005: 2440
Other values (3686): 2018679

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 20320020
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 23
Unique (%): < 0.1%

Sample

1st row: 03/01/2020
2nd row: 02/01/2020
3rd row: 02/01/2020
4th row: 02/01/2020
5th row: 02/01/2020
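
The values listed for this variable look like day/month/year strings (31/01/2005, for instance, can only be read with the day first), which would explain why the column is profiled as categorical rather than as a date. The following is a minimal sketch, assuming that layout holds for every row; the helper name `parse_collection_date` is hypothetical and not part of the report.

```python
from datetime import datetime

def parse_collection_date(text):
    """Parse a 'dd/mm/yyyy' string into a date (assumed layout)."""
    return datetime.strptime(text, "%d/%m/%Y").date()

# A few values copied from the report for this column.
samples = ["31/01/2005", "28/02/2007", "03/01/2020"]
parsed = [parse_collection_date(s) for s in samples]

# Once parsed, the values sort chronologically instead of lexically.
print(min(parsed), max(parsed))
```

Parsing before profiling would let the column be treated as a date rather than as 3691 unrelated categories.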

Common Values

Value | Count | Frequency (%)
31/01/2005 | 3069 | 0.2%
28/02/2007 | 2661 | 0.1%
16/11/2004 | 2592 | 0.1%
28/03/2005 | 2561 | 0.1%
03/01/2005 | 2440 | 0.1%
08/03/2005 | 2387 | 0.1%
07/03/2007 | 2338 | 0.1%
18/07/2005 | 2322 | 0.1%
11/04/2005 | 2316 | 0.1%
21/03/2005 | 2306 | 0.1%
Other values (3681) | 2007010 | 98.8%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
05582062
27.5%
/4064004
20.0%
23470625
17.1%
12962780
14.6%
5651505
 
3.2%
6631805
 
3.1%
7620033
 
3.1%
4609926
 
3.0%
3603377
 
3.0%
8593340
 
2.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number16256016
80.0%
Other Punctuation4064004
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
05582062
34.3%
23470625
21.3%
12962780
18.2%
5651505
 
4.0%
6631805
 
3.9%
7620033
 
3.8%
4609926
 
3.8%
3603377
 
3.7%
8593340
 
3.6%
9530563
 
3.3%
Other Punctuation
ValueCountFrequency (%)
/4064004
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common20320020
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
05582062
27.5%
/4064004
20.0%
23470625
17.1%
12962780
14.6%
5651505
 
3.2%
6631805
 
3.1%
7620033
 
3.1%
4609926
 
3.0%
3603377
 
3.0%
8593340
 
2.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII20320020
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
05582062
27.5%
/4064004
20.0%
23470625
17.1%
12962780
14.6%
5651505
 
3.2%
6631805
 
3.1%
7620033
 
3.1%
4609926
 
3.0%
3603377
 
3.0%
8593340
 
2.9%

Valor de Venda
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3314
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.821610942
Minimum: 0.59
Maximum: 6.699
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 MiB

Quantile statistics

Minimum: 0.59
5-th percentile: 0.999
Q1: 1.298
median: 1.699
Q3: 2.099
95-th percentile: 2.999
Maximum: 6.699
Range: 6.109
Interquartile range (IQR): 0.801

Descriptive statistics

Standard deviation: 0.724623272
Coefficient of variation (CV): 0.3977925555
Kurtosis: 3.71673882
Mean: 1.821610942
Median Absolute Deviation (MAD): 0.4
Skewness: 1.58819351
Sum: 3701517.078
Variance: 0.5250788863
Monotonicity: Not monotonic
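
Several of the figures above are derived from the others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. As a quick consistency check, the sketch below recomputes them from the reported values; the variable names are illustrative only.

```python
# Values copied from the report for Valor de Venda.
minimum, q1, q3, maximum = 0.59, 1.298, 2.099, 6.699
mean, std = 1.821610942, 0.724623272
n_observations = 2032002

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n_observations     # Sum (no values are missing here)

print(round(value_range, 3), round(iqr, 3))
print(round(cv, 6), round(variance, 6), round(total, 1))
```

Small differences in the last digits are expected, because the inputs are themselves rounded in the report.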
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1.899 | 84918 | 4.2%
1.299 | 81050 | 4.0%
1.799 | 77609 | 3.8%
1.399 | 67965 | 3.3%
1.999 | 65103 | 3.2%
1.199 | 63921 | 3.1%
1.699 | 49744 | 2.4%
1.499 | 39660 | 2.0%
1.099 | 36743 | 1.8%
2.699 | 34593 | 1.7%
Other values (3304) | 1430696 | 70.4%
ValueCountFrequency (%)
0.597
 
< 0.1%
0.59926
 
< 0.1%
0.62
 
< 0.1%
0.6095
 
< 0.1%
0.614
 
< 0.1%
0.6181
 
< 0.1%
0.61966
< 0.1%
0.6222
 
< 0.1%
0.6282
 
< 0.1%
0.62985
< 0.1%
ValueCountFrequency (%)
6.6991
 
< 0.1%
6.2991
 
< 0.1%
6.1995
 
< 0.1%
6.1971
 
< 0.1%
6.0981
 
< 0.1%
6.0472
 
< 0.1%
5.99916
< 0.1%
5.9982
 
< 0.1%
5.9393
 
< 0.1%
5.9173
 
< 0.1%

Valor de Compra
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 33211
Distinct (%): 3.1%
Missing: 965061
Missing (%): 47.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.416765039
Minimum: 0.3398
Maximum: 3.314
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 MiB

Quantile statistics

Minimum: 0.3398
5-th percentile: 0.7864
Q1: 1.0118
median: 1.3392
Q3: 1.6717
95-th percentile: 2.4186
Maximum: 3.314
Range: 2.9742
Interquartile range (IQR): 0.6599

Descriptive statistics

Standard deviation: 0.5018230501
Coefficient of variation (CV): 0.3542034397
Kurtosis: -0.2372562475
Mean: 1.416765039
Median Absolute Deviation (MAD): 0.3293
Skewness: 0.7281972133
Sum: 1511604.708
Variance: 0.2518263736
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.96 | 3063 | 0.2%
0.97 | 2691 | 0.1%
1.03 | 2682 | 0.1%
0.95 | 2214 | 0.1%
0.9899 | 2189 | 0.1%
0.9999 | 2115 | 0.1%
0.9799 | 1994 | 0.1%
1 | 1787 | 0.1%
1.08 | 1740 | 0.1%
1.07 | 1690 | 0.1%
Other values (33201) | 1044776 | 51.4%
(Missing) | 965061 | 47.5%
ValueCountFrequency (%)
0.33981
< 0.1%
0.349511
< 0.1%
0.42711
< 0.1%
0.432041
< 0.1%
0.43461
< 0.1%
0.4361
< 0.1%
0.441
< 0.1%
0.4491
< 0.1%
0.45632
< 0.1%
0.456311
< 0.1%
ValueCountFrequency (%)
3.3142
< 0.1%
3.25142
< 0.1%
3.21262
< 0.1%
3.19451
< 0.1%
3.14722
< 0.1%
3.14042
< 0.1%
3.12431
< 0.1%
3.12211
< 0.1%
3.10122
< 0.1%
3.07542
< 0.1%

Unidade de Medida
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

R$ / litro: 2032002

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 20320020
Distinct characters: 9
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: R$ / litro
2nd row: R$ / litro
3rd row: R$ / litro
4th row: R$ / litro
5th row: R$ / litro

Common Values

Value | Count | Frequency (%)
R$ / litro | 2032002 | 100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
r2032002
33.3%
2032002
33.3%
litro2032002
33.3%

Most occurring characters

ValueCountFrequency (%)
4064004
20.0%
R2032002
10.0%
$2032002
10.0%
/2032002
10.0%
l2032002
10.0%
i2032002
10.0%
t2032002
10.0%
r2032002
10.0%
o2032002
10.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter10160010
50.0%
Space Separator4064004
 
20.0%
Uppercase Letter2032002
 
10.0%
Currency Symbol2032002
 
10.0%
Other Punctuation2032002
 
10.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l2032002
20.0%
i2032002
20.0%
t2032002
20.0%
r2032002
20.0%
o2032002
20.0%
Space Separator
ValueCountFrequency (%)
4064004
100.0%
Uppercase Letter
ValueCountFrequency (%)
R2032002
100.0%
Currency Symbol
ValueCountFrequency (%)
$2032002
100.0%
Other Punctuation
ValueCountFrequency (%)
/2032002
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin12192012
60.0%
Common8128008
40.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
R2032002
16.7%
l2032002
16.7%
i2032002
16.7%
t2032002
16.7%
r2032002
16.7%
o2032002
16.7%
Common
ValueCountFrequency (%)
4064004
50.0%
$2032002
25.0%
/2032002
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII20320020
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4064004
20.0%
R2032002
10.0%
$2032002
10.0%
/2032002
10.0%
l2032002
10.0%
i2032002
10.0%
t2032002
10.0%
r2032002
10.0%
o2032002
10.0%

Bandeira
Categorical

HIGH CARDINALITY

Distinct: 142
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

BRANCA: 849055
PETROBRAS DISTRIBUIDORA S.A.: 338909
RAIZEN: 264267
IPIRANGA: 252337
COSAN LUBRIFICANTES: 99969
Other values (137): 227465

Length

Max length: 28
Median length: 6
Mean length: 10.68584972
Min length: 2

Characters and Unicode

Total characters: 21713668
Distinct characters: 38
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): < 0.1%

Sample

1st row: PETROBRAS DISTRIBUIDORA S.A.
2nd row: RAIZEN
3rd row: BRANCA
4th row: PETROBRAS DISTRIBUIDORA S.A.
5th row: IPIRANGA

Common Values

Value | Count | Frequency (%)
BRANCA | 849055 | 41.8%
PETROBRAS DISTRIBUIDORA S.A. | 338909 | 16.7%
RAIZEN | 264267 | 13.0%
IPIRANGA | 252337 | 12.4%
COSAN LUBRIFICANTES | 99969 | 4.9%
CBPI | 94713 | 4.7%
ALESAT | 32705 | 1.6%
ALE COMBUSTΓõ£├¼VEIS | 16168 | 0.8%
PETROSUL | 13927 | 0.7%
LIQUIGΓõ£├╝S | 11771 | 0.6%
Other values (132) | 58181 | 2.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
branca849055
29.8%
distribuidora341323
12.0%
petrobras338909
 
11.9%
s.a338909
 
11.9%
raizen264267
 
9.3%
ipiranga252337
 
8.9%
cosan101719
 
3.6%
lubrificantes99969
 
3.5%
cbpi94713
 
3.3%
alesat32705
 
1.1%
Other values (155)135387
 
4.8%

Most occurring characters

ValueCountFrequency (%)
A3818500
17.6%
R2893780
13.3%
I2156796
9.9%
B1753068
 
8.1%
N1587053
 
7.3%
S1331735
 
6.1%
C1173062
 
5.4%
T866574
 
4.0%
O833527
 
3.8%
E831129
 
3.8%
Other values (28)4468444
20.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter20087714
92.5%
Space Separator817851
 
3.8%
Other Punctuation691965
 
3.2%
Other Symbol41113
 
0.2%
Lowercase Letter28891
 
0.1%
Currency Symbol28775
 
0.1%
Other Number16285
 
0.1%
Modifier Symbol1074
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A3818500
19.0%
R2893780
14.4%
I2156796
10.7%
B1753068
8.7%
N1587053
7.9%
S1331735
 
6.6%
C1173062
 
5.8%
T866574
 
4.3%
O833527
 
4.1%
E831129
 
4.1%
Other values (17)2842490
14.2%
Other Symbol
ValueCountFrequency (%)
28775
70.0%
11773
28.6%
565
 
1.4%
Other Punctuation
ValueCountFrequency (%)
.691952
> 99.9%
'13
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
õ28775
99.6%
ó116
 
0.4%
Space Separator
ValueCountFrequency (%)
817851
100.0%
Currency Symbol
ValueCountFrequency (%)
£28775
100.0%
Other Number
ValueCountFrequency (%)
¼16285
100.0%
Modifier Symbol
ValueCountFrequency (%)
`1074
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin20087830
92.5%
Common1597063
 
7.4%
Greek28775
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
A3818500
19.0%
R2893780
14.4%
I2156796
10.7%
B1753068
8.7%
N1587053
7.9%
S1331735
 
6.6%
C1173062
 
5.8%
T866574
 
4.3%
O833527
 
4.1%
E831129
 
4.1%
Other values (18)2842606
14.2%
Common
ValueCountFrequency (%)
817851
51.2%
.691952
43.3%
£28775
 
1.8%
28775
 
1.8%
¼16285
 
1.0%
11773
 
0.7%
`1074
 
0.1%
565
 
< 0.1%
'13
 
< 0.1%
Greek
ValueCountFrequency (%)
Γ28775
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII21569793
99.3%
None102762
 
0.5%
Box Drawing41113
 
0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A3818500
17.7%
R2893780
13.4%
I2156796
10.0%
B1753068
8.1%
N1587053
 
7.4%
S1331735
 
6.2%
C1173062
 
5.4%
T866574
 
4.0%
O833527
 
3.9%
E831129
 
3.9%
Other values (19)4324569
20.0%
None
ValueCountFrequency (%)
Γ28775
28.0%
õ28775
28.0%
£28775
28.0%
¼16285
15.8%
ó116
 
0.1%
Ò36
 
< 0.1%
Box Drawing
ValueCountFrequency (%)
28775
70.0%
11773
28.6%
565
 
1.4%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
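
The calculation described above can be sketched in a few lines of plain Python: rank each variable, then apply the covariance-over-standard-deviations formula to the ranks. This is an illustrative sketch, not the report generator's implementation; the function names and the toy data are assumptions.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data (not from the dataset): y grows monotonically but not linearly with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

On this toy input ρ reaches its maximum while r stays below it, which is the nonlinear-but-monotonic case the description refers to.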

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
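
As a hedged illustration of the formula and of the invariance property, the sketch below computes r on toy data and again after shifting and rescaling both variables. The function name and the numbers are assumptions made for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # The 1/n factors of covariance and standard deviations cancel out.
    return cov / (spread_x * spread_y)

# Toy data, not taken from the dataset.
x = [1.0, 2.0, 4.0, 7.0, 9.0]
y = [1.5, 1.9, 3.2, 4.1, 6.0]

r_original = pearson_r(x, y)
# Change location and scale of each variable separately (positive scale factors).
r_transformed = pearson_r([3 * a + 7 for a in x], [0.5 * b - 2 for b in y])
print(r_original, r_transformed)
```

Both calls return the same value up to floating-point error; a negative scale factor on one variable would flip the sign of r.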

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
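
The pair counting described above can be sketched directly. The version below follows the definition as stated (often called tau-a) and makes no adjustment for ties, so it may differ from what a statistics library reports on tied data; the function name and toy values are assumptions.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, without any tie correction."""
    concordant = discordant = total = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        total += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / total

# Toy data: one swapped pair out of six possible pairs.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

The double loop is quadratic in the number of observations, so for a dataset of this size a library routine or a sample would be more practical.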

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
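
A minimal sketch of one common formulation of the bias correction follows: the chi-squared statistic is divided by the sample size, a bias term is subtracted, and the table dimensions are shrunk before normalising. It is meant as an illustration under that assumption and may not match the report generator's code exactly; the function name and toy data are hypothetical.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    # A constant column (one level) leaves nothing to associate with.
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Toy labels, not from the dataset.
same = cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50)
unrelated = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(same, unrelated)
```

Constant columns such as Produto or Unidade de Medida fall into the degenerate branch, since a variable with a single level cannot vary with anything.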

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
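
Nullity correlation, as described for the heatmap above, can be read as the ordinary correlation between two columns' "value is present" indicators. The sketch below assumes that reading and uses `None` to stand for a missing cell; the function name and toy columns are hypothetical.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    spread = sqrt(sum((p - mean_a) ** 2 for p in a) * sum((q - mean_b) ** 2 for q in b))
    # A column that is always present (or always missing) has no variance,
    # so the correlation is undefined for it.
    return cov / spread if spread > 0 else None

# Toy columns, not from the dataset: None marks a missing cell.
complemento = [None, None, "KM 2", None]
valor_compra = [None, 2.5, 2.6, None]
result = nullity_correlation(complemento, valor_compra)
print(result)
```

A value near +1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and fully populated columns have no defined value.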

Sample

First rows

(index) | Semestre | ΓêÒΓòùΓõÉRegiao - Sigla | Estado - Sigla | Municipio | Revenda | CNPJ da Revenda | Nome da Rua | Numero Rua | Complemento | Bairro | Cep | Produto | Data da Coleta | Valor de Venda | Valor de Compra | Unidade de Medida | Bandeira
0 | 2020-01 | SE | SP | GUARULHOS | AUTO POSTO SAKAMOTO LTDA | 49.051.667/0001-02 | RODOVIA PRESIDENTE DUTRA | S/N | KM 210,5-SENT SP/RJ | BONSUCESSO | 07178-580 | ETANOL | 03/01/2020 | 3.199 | NaN | R$ / litro | PETROBRAS DISTRIBUIDORA S.A.
1 | 2020-01 | SE | SP | ADAMANTINA | REDE GAZOLI AUTO POSTO LTDA. | 09.116.143/0001-38 | AVENIDA MARECHAL CASTELO BRANCO | 15 | NaN | VILA JAMIL DE LIMA | 17800-000 | ETANOL | 02/01/2020 | 2.790 | NaN | R$ / litro | RAIZEN
2 | 2020-01 | SE | SP | ADAMANTINA | AUTO POSTO CARREIRO LTDA | 55.451.876/0001-46 | AVENIDA CAP JOSE A DE OLIVEIRA | 160 | NaN | CENTRO | 17800-000 | ETANOL | 02/01/2020 | 2.840 | 2.5895 | R$ / litro | BRANCA
3 | 2020-01 | SE | SP | ADAMANTINA | AUTO POSTO PROGRESSO DE ADAMANTINA LTDA | 52.605.052/0001-95 | AVENIDA RIO BRANCO | 764 | NaN | CENTRO | 17800-000 | ETANOL | 02/01/2020 | 2.849 | NaN | R$ / litro | PETROBRAS DISTRIBUIDORA S.A.
4 | 2020-01 | SE | SP | ADAMANTINA | MARCIO A SPOSITO TRANSPORTES LTDA | 54.187.588/0002-44 | AVENIDA RIO BRANCO | 1625 | NaN | VILA INDUSTRIAL | 17800-000 | ETANOL | 02/01/2020 | 2.730 | NaN | R$ / litro | IPIRANGA
5 | 2020-01 | SE | SP | ADAMANTINA | MAVESA MATUOKA VEICULOS LTDA | 43.001.569/0002-65 | AVENIDA RIO BRANCO | 600 | NaN | CENTRO | 17800-000 | ETANOL | 02/01/2020 | 2.790 | 2.6013 | R$ / litro | IPIRANGA
6 | 2020-01 | SE | SP | AMPARO | AUTO POSTO DBV LTDA | 09.371.227/0001-18 | AVENIDA ANESIO GUIDI | 344 | ESQ.R.RACHID KASSOUF | DISTRITO TRES PONTES | 13909-000 | ETANOL | 02/01/2020 | 3.099 | 2.6887 | R$ / litro | PETROBRAS DISTRIBUIDORA S.A.
7 | 2020-01 | SE | SP | AMPARO | AUTO POSTO PORTAL DAS AGUAS LTDA | 08.772.232/0001-70 | AVENIDA WALDYR BEIRA | 182 | NaN | FIGUEIRA | 13904-452 | ETANOL | 02/01/2020 | 2.999 | NaN | R$ / litro | PETROBRAS DISTRIBUIDORA S.A.
8 | 2020-01 | SE | SP | AMPARO | J M ANDRETA & CIA LTDA | 48.827.125/0001-16 | AVENIDA BERNARDINO DE CAMPOS | 535 | NaN | CENTRO | 13900-400 | ETANOL | 02/01/2020 | 2.990 | 2.6836 | R$ / litro | IPIRANGA
9 | 2020-01 | SE | SP | AMPARO | J M ANDRETA & CIA LTDA | 48.827.125/0002-05 | RUA BENTA MARIA DE BARROS | 181 | NaN | ARCADAS | 13908-000 | ETANOL | 02/01/2020 | 3.090 | 2.7627 | R$ / litro | IPIRANGA

Last rows

(index) | Semestre | ΓêÒΓòùΓõÉRegiao - Sigla | Estado - Sigla | Municipio | Revenda | CNPJ da Revenda | Nome da Rua | Numero Rua | Complemento | Bairro | Cep | Produto | Data da Coleta | Valor de Venda | Valor de Compra | Unidade de Medida | Bandeira
2031992 | 2016-01 | SE | SP | CARAGUATATUBA | AUTO POSTO MORRO SANTO ANTONIO LTDA | 15.753.624/0001-57 | AVENIDA MARGINAL | 6155 | NaN | MASSAGUACU | 11677-000 | ETANOL | 27/06/2016 | 2.694 | 2.1676 | R$ / litro | IPIRANGA
2031993 | 2016-01 | SE | SP | GUARULHOS | CENTRO AUTOMOTIVO GUARUMON LTDA | 18.879.915/0001-84 | RUA OCTAVIO BRAGA DE MESQUITA | S/N | NaN | JARDIM IPANEMA | 07140-020 | ETANOL | 27/06/2016 | 2.199 | 1.9313 | R$ / litro | RAIZEN
2031994 | 2016-01 | SE | SP | POA | AUTO POSTO PADRE EUSTAQUIO LTDA | 19.644.825/0001-77 | AVENIDA VINTE E SEIS DE MARCO | 16 | NaN | CENTRO | 08562-140 | ETANOL | 27/06/2016 | 2.099 | 1.8499 | R$ / litro | BRANCA
2031995 | 2016-01 | SE | SP | SAO PAULO | AUTO POSTO TREVO DA SORTE LTDA | 18.765.781/0001-70 | RUA IBITIRAMA | 704 | NaN | VILA PRUDENTE | 03133-100 | ETANOL | 28/06/2016 | 2.199 | 1.9871 | R$ / litro | IPIRANGA
2031996 | 2016-01 | SE | SP | ITANHAEM | AUTO POSTO BELAS ARTES III LTDA | 19.713.605/0001-58 | RUA NESTOR LEAL | 49 | ESQ. COM GENTIL PEREZ | CENTRO | 11740-000 | ETANOL | 27/06/2016 | 2.649 | 2.1832 | R$ / litro | IPIRANGA
2031997 | 2016-01 | SE | SP | BIRIGUI | COMERCIAL CURI PANDINI LTDA - EPP | 04.238.771/0006-87 | AV. YOUSSEF ISMAIL MANSOUR - ZE TURCO | 169 | NaN | JARDIM ALTO DOS SILVARES | 16202-484 | ETANOL | 27/06/2016 | 1.999 | NaN | R$ / litro | BRANCA
2031998 | 2016-01 | SE | SP | JOSE BONIFACIO | AUTO POSTO TMJ - JOSE BONIFACIO LTDA. | 18.921.535/0001-60 | AVENIDA JOAQUIM MOREIRA DA SILVA | 2875 | NaN | JARDIM PANORAMA | 15200-000 | ETANOL | 27/06/2016 | 1.989 | 1.8010 | R$ / litro | BRANCA
2031999 | 2016-01 | SE | SP | FRANCA | AUTO POSTO DISTRITO BEIRA RIO LTDA | 07.236.625/0001-04 | AVENIDA DOUTOR SEVERINO TOSTES MEIRELLES | 2,02 | NaN | DISTRITO INDUSTRIAL | 14406-004 | ETANOL | 27/06/2016 | 2.395 | NaN | R$ / litro | PETROBRAS DISTRIBUIDORA S.A.
2032000 | 2016-01 | SE | SP | BRAGANCA PAULISTA | USSEN ALI CHAHIME AUTO POSTO EIRELI | 06.107.661/0001-05 | AVENIDA DOS IMIGRANTES | 4133 | NaN | MATADOURO | 12910-341 | ETANOL | 27/06/2016 | 2.299 | NaN | R$ / litro | BRANCA
2032001 | 2016-01 | SE | SP | SANTO ANDRE | PATTO ROSA AUTO POSTO LTDA | 19.842.325/0001-40 | AVENIDA UTINGA | 865 | NaN | VILA METALΓõ£├£RGICA | 09220-611 | ETANOL | 27/06/2016 | 2.197 | 1.8099 | R$ / litro | BRANCA

Duplicate rows

Most frequently occurring

SemestreΓêÒΓòùΓõÉRegiao - SiglaEstado - SiglaMunicipioRevendaCNPJ da RevendaNome da RuaNumero RuaComplementoBairroCepProdutoData da ColetaValor de VendaValor de CompraUnidade de MedidaBandeira# duplicates
02005-01SESPMOGI MIRIMAUTO POSTO GUAΓõ£├ºU MIRIM LTDA58.861.238/0001-91RUA PADRE ROQUE2348NaNCENTRO13800-207ETANOL02/05/20051.130NaNR$ / litroBRANCA5
12005-01SESPMOGI MIRIMAUTO POSTO GUAΓõ£├ºU MIRIM LTDA58.861.238/0001-91RUA PADRE ROQUE2348NaNCENTRO13800-207ETANOL06/05/20051.130NaNR$ / litroBRANCA5
22005-01SESPSERTAOZINHOROSAC COMERCIO DE DERIV DE PETROLEO LTD01.139.257/0001-91AVENIDA ANTONIO PASCHOAL69NOVA SERTAOZINHO14160-000ETANOL03/06/20050.9990.80083R$ / litroCBPI5
32005-01SESPSERTAOZINHOROSAC COMERCIO DE DERIV DE PETROLEO LTD01.139.257/0001-91AVENIDA ANTONIO PASCHOAL69NOVA SERTAOZINHO14160-000ETANOL30/05/20050.9990.80083R$ / litroCBPI5
42005-01SESPVOTUPORANGAVITORIA COMERCIO DE COMBUSTIVEIS DE VOTUPORANGA LTDA02.472.382/0001-81AVENIDA BRASIL4810NaNJARDIM SAO JUDAS TADEU15500-051ETANOL03/06/20050.9990.71997R$ / litroBRANCA4
52005-01SESPVOTUPORANGAVITORIA COMERCIO DE COMBUSTIVEIS DE VOTUPORANGA LTDA02.472.382/0001-81AVENIDA BRASIL4810NaNJARDIM SAO JUDAS TADEU15500-051ETANOL30/05/20050.9990.71997R$ / litroBRANCA4